

Facts Fade Fast: Evaluating Memorization of Outdated Medical Knowledge in Large Language Models

Vladika, Juraj, Dhaini, Mahdi, Matthes, Florian

arXiv.org Artificial Intelligence

The growing capabilities of Large Language Models (LLMs) show significant potential to enhance healthcare by assisting medical researchers and physicians. However, their reliance on static training data is a major risk when medical recommendations evolve with new research and developments. When LLMs memorize outdated medical knowledge, they can provide harmful advice or fail at clinical reasoning tasks. To investigate this problem, we introduce two novel question-answering (QA) datasets derived from systematic reviews: MedRevQA (16,501 QA pairs covering general biomedical knowledge) and MedChangeQA (a subset of 512 QA pairs where medical consensus has changed over time). Our evaluation of eight prominent LLMs on the datasets reveals consistent reliance on outdated knowledge across all models. We additionally analyze the influence of obsolete pre-training data and training strategies to explain this phenomenon and propose future directions for mitigation, laying the groundwork for developing more current and reliable medical AI systems.
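The evaluation idea described above can be sketched as a simple labeling step: a model's answer is compared against both the current consensus and the superseded one, so answers echoing old literature can be flagged. The field names and the example QA pair below are illustrative assumptions, not taken from MedRevQA or MedChangeQA.

```python
# Hypothetical sketch: label a model answer as reflecting current or
# outdated medical consensus. All names here are illustrative.

def classify_answer(model_answer: str, current: str, outdated: str) -> str:
    """Label an answer as 'current', 'outdated', or 'other'."""
    ans = model_answer.strip().lower()
    if ans == current.lower():
        return "current"
    if ans == outdated.lower():
        return "outdated"
    return "other"

qa = {
    "question": "Does intervention X improve outcome Y?",  # illustrative
    "current_consensus": "no",    # per the newest systematic review
    "outdated_consensus": "yes",  # what older literature claimed
}

label = classify_answer("Yes", qa["current_consensus"], qa["outdated_consensus"])
print(label)  # → outdated
```

A model that memorized pre-2020 literature would systematically land in the "outdated" bucket on the changed-consensus subset, which is the failure mode the paper measures.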


RM-R1: Reward Modeling as Reasoning

Chen, Xiusi, Li, Gaotang, Wang, Ziqi, Jin, Bowen, Qian, Cheng, Wang, Yu, Wang, Hongru, Zhang, Yu, Zhang, Denghui, Zhang, Tong, Tong, Hanghang, Ji, Heng

arXiv.org Artificial Intelligence

Reward modeling is essential for aligning large language models with human preferences through reinforcement learning from human feedback. To provide accurate reward signals, a reward model (RM) should stimulate deep thinking and conduct interpretable reasoning before assigning a score or a judgment. Inspired by recent advances in long chain-of-thought on reasoning-intensive tasks, we hypothesize and validate that integrating reasoning capabilities into reward modeling significantly enhances RMs' interpretability and performance. To this end, we introduce a new class of generative reward models - Reasoning Reward Models (ReasRMs) - which formulate reward modeling as a reasoning task. We propose a reasoning-oriented training pipeline and train a family of ReasRMs, RM-R1. RM-R1 features a chain-of-rubrics (CoR) mechanism - self-generating sample-level chat rubrics or math/code solutions, and evaluating candidate responses against them. The training of RM-R1 consists of two key stages: (1) distillation of high-quality reasoning chains and (2) reinforcement learning with verifiable rewards. Empirically, our models achieve state-of-the-art performance across three reward model benchmarks on average, outperforming much larger open-weight models (e.g., INF-ORM-Llama3.1-70B) and proprietary ones (e.g., GPT-4o) by up to 4.9%. Beyond final performance, we perform thorough empirical analyses to understand the key ingredients of successful ReasRM training. To facilitate future research, we release six ReasRM models along with code and data at https://github.com/RM-R1-UIUC/RM-R1.
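The chain-of-rubrics mechanism can be sketched at the prompt level: the reward model first writes its own evaluation rubrics, then judges two candidates against them and emits a verdict. The template wording and verdict format below are assumptions for illustration, not the actual RM-R1 prompt.

```python
# Minimal sketch of the chain-of-rubrics (CoR) idea: ask a generative
# reward model to self-generate rubrics, reason, then output a verdict.
# The prompt template and 'Verdict:' convention are illustrative.

def build_cor_prompt(question: str, answer_a: str, answer_b: str) -> str:
    return (
        "You are a reward model. First generate evaluation rubrics for the "
        "question, then score each candidate against them.\n"
        f"Question: {question}\n"
        f"Candidate A: {answer_a}\n"
        f"Candidate B: {answer_b}\n"
        "End your reasoning with a verdict line: 'Verdict: A' or 'Verdict: B'."
    )

def parse_verdict(model_output: str) -> str:
    """Extract the preferred candidate from the model's reasoning trace."""
    for line in reversed(model_output.splitlines()):
        if line.strip().startswith("Verdict:"):
            return line.split(":", 1)[1].strip()
    return "unknown"

trace = "Rubric 1: factual accuracy...\nA is more accurate.\nVerdict: A"
print(parse_verdict(trace))  # → A
```

Parsing a final verdict line is what makes the reward verifiable, which is the hook for the paper's second training stage (RL with verifiable rewards).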


Improving Sickle Cell Disease Classification: A Fusion of Conventional Classifiers, Segmented Images, and Convolutional Neural Networks

Cardoso, Victor Júnio Alcântara, Moreira, Rodrigo, Mari, João Fernando, Moreira, Larissa Ferreira Rodrigues

arXiv.org Artificial Intelligence

Sickle cell anemia, which is characterized by abnormal erythrocyte morphology, can be detected using microscopic images. Computational techniques enhance diagnostic and treatment efficiency in medicine. However, many computational techniques, particularly those based on Convolutional Neural Networks (CNNs), require high resources and long training times, highlighting research opportunities in methods with low computational overhead. In this paper, we propose a novel approach combining conventional classifiers, segmented images, and CNNs for the automated classification of sickle cell disease. We evaluated the impact of segmented images on classification, providing insight into deep learning integration. Our results demonstrate that using segmented images and CNN features with an SVM achieves an accuracy of 96.80%. This finding is relevant for computationally efficient scenarios, paving the way for future research and advancements in medical-image analysis.
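The fusion pipeline described above (segment the cell image, extract CNN features, classify with a conventional classifier) can be sketched end-to-end. To keep the sketch dependency-free, a nearest-centroid classifier stands in for the paper's SVM, and the 2-D "CNN features" are synthetic; only the pipeline shape reflects the abstract.

```python
# Sketch of the conventional-classifier-on-CNN-features pipeline.
# Nearest-centroid stands in for the SVM; feature values are synthetic.

def nearest_centroid_fit(X, y):
    """Compute one mean feature vector (centroid) per class label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda lab: dist(centroids[lab], x))

# Synthetic 2-D "CNN features" for normal vs sickle cells (illustrative).
X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
y = ["normal", "normal", "sickle", "sickle"]
model = nearest_centroid_fit(X, y)
print(nearest_centroid_predict(model, [0.85, 0.15]))  # → normal
```

In practice one would extract the feature vectors from a pretrained CNN's penultimate layer and fit an SVM (e.g., with an RBF kernel) in its place; the appeal the abstract points at is that this classical head is far cheaper to train than the full network.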


The 10 biggest science stories of 2023 – chosen by scientists

The Guardian

While western billionaires were busy sending rockets to space only for them to crash and burn, scientists in India were quietly doing something no one had accomplished before. Their Chandrayaan-3 moon lander was the first mission to reach the lunar south pole – an unexplored region where reservoirs of frozen water are believed to exist. I remember my heart soaring when images of the control room in India spread around social media, showing senior female scientists celebrating their incredible achievement. The success of Chandrayaan-3, launched in July 2023, showed the world that not only is India a major player in space, but that a moon lander can be launched successfully for $75m (£60m). This cost is not to be sniffed at but it is much cheaper than most other countries' budgets for a moon mission. July 2023 was an extremely busy month for space firsts.


The Download: Big Tech's AI stranglehold, and gene-editing treatments

MIT Technology Review

Until late November, when the epic saga of OpenAI's board breakdown unfolded, the casual observer could be forgiven for assuming that the ecosystem around generative AI was vibrant and competitive. But this is not the case--nor has it ever been. And understanding why is fundamental to understanding what AI is, and what threats it poses. Put simply, in the context of the current paradigm of building larger- and larger-scale AI systems, there is no AI without Big Tech. With vanishingly few exceptions, every startup, new entrant, and even AI research lab is dependent on these firms. Those with the money make the rules.


Pain Forecasting using Self-supervised Learning and Patient Phenotyping: An attempt to prevent Opioid Addiction

Padhee, Swati, Banerjee, Tanvi, Abrams, Daniel M., Shah, Nirmish

arXiv.org Artificial Intelligence

Sickle Cell Disease (SCD) is a chronic genetic disorder characterized by recurrent acute painful episodes. Opioids are often used to manage these painful episodes; the extent of their use in managing pain in this disorder is an issue of debate. The risk of addiction and side effects of these opioid treatments can often lead to more pain episodes in the future. Hence, it is crucial to forecast future patient pain trajectories to help patients manage their SCD and improve their quality of life without compromising their treatment. It is challenging to obtain many pain records for designing forecasting models, since pain is mainly recorded through patients' self-reports. Therefore, it is expensive and burdensome (due to the need for patient compliance) to solve pain forecasting problems in a purely supervised manner. In light of this challenge, we propose to solve the pain forecasting problem using self-supervised learning methods. Also, clustering such time-series data is crucial for patient phenotyping, anticipating patients' prognoses by identifying "similar" patients, and designing treatment guidelines tailored to homogeneous patient subgroups. Hence, we propose a self-supervised learning approach for clustering time-series data, where each cluster comprises patients who share similar future pain profiles. Experiments on five years of real-world datasets show that our models achieve superior performance over state-of-the-art benchmarks and identify meaningful clusters that can be translated into actionable information for clinical decision-making.
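The self-supervised setup described above gets its training signal from the pain series itself: hide a future window and predict it from the past, so no extra labels are needed. The toy moving-average "model" below exists only to illustrate the pretext task; the paper learns representations rather than using this heuristic, and the example series is invented.

```python
# Sketch of a forecasting pretext task on a self-reported pain series:
# supervision comes from the series itself (past window -> future target).
# The moving-average forecaster is a placeholder, not the paper's model.

def make_pretext_pairs(series, past=3, future=1):
    """Slice a pain series into (past window, future target) training pairs."""
    pairs = []
    for i in range(len(series) - past - future + 1):
        pairs.append((series[i:i + past], series[i + past:i + past + future]))
    return pairs

def moving_average_forecast(past_window):
    return sum(past_window) / len(past_window)

series = [2, 3, 4, 6, 7, 8]  # illustrative 0-10 pain scores over time
pairs = make_pretext_pairs(series)
errors = [abs(moving_average_forecast(p) - f[0]) for p, f in pairs]
print(len(pairs), round(sum(errors) / len(errors), 2))
```

Representations learned this way can then be clustered so that each cluster groups patients with similar future pain profiles, which is the phenotyping step the abstract describes.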


A Novel Deep Learning based Model for Erythrocytes Classification and Quantification in Sickle Cell Disease

Bhatia, Manish, Meena, Balram, Rathi, Vipin Kumar, Tiwari, Prayag, Jaiswal, Amit Kumar, Ansari, Shagaf M, Kumar, Ajay, Marttinen, Pekka

arXiv.org Artificial Intelligence

The shape of erythrocytes or red blood cells is altered in several pathological conditions. Therefore, identifying and quantifying different erythrocyte shapes can help diagnose various diseases and assist in designing a treatment strategy. Machine Learning (ML) can be efficiently used to identify and quantify distorted erythrocyte morphologies. In this paper, we propose a customized deep convolutional neural network (CNN) model to classify and quantify the distorted and normal morphology of erythrocytes from images taken from the blood samples of patients suffering from Sickle Cell Disease (SCD). We chose SCD as a model disease condition due to the presence of diverse erythrocyte morphologies in the blood samples of SCD patients. For the analysis, we used 428 raw microscopic images of SCD blood samples and generated a dataset consisting of 10,377 single-cell images. We focused on three well-defined erythrocyte shapes: discocyte, oval, and sickle. We used an 18-layer deep CNN architecture to identify and quantify these shapes with 81% accuracy, outperforming other models. We also used SHAP and LIME for further interpretability. The proposed model can help clinicians quickly and accurately analyze SCD blood samples and make the right decisions for better management of SCD.
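The quantification step described above reduces to counting: once each single-cell image carries a predicted shape label, aggregating labels per blood sample gives the morphology distribution. The labels and predictions below are illustrative, not from the paper's dataset.

```python
# Sketch of morphology quantification: per-class counts and fractions
# over the predicted labels of one blood sample's single-cell images.

from collections import Counter

def quantify_morphologies(predictions):
    """Return {shape: (count, fraction)} for one blood-sample image set."""
    counts = Counter(predictions)
    total = len(predictions)
    return {shape: (n, n / total) for shape, n in counts.items()}

# Illustrative per-cell predictions for one sample.
preds = ["discocyte"] * 7 + ["sickle"] * 2 + ["oval"]
summary = quantify_morphologies(preds)
print(summary["sickle"])  # → (2, 0.2)
```

A rising sickle-cell fraction across samples is the kind of clinically actionable signal the abstract says the classifier feeds into.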


The 10 biggest science stories of 2022 – chosen by scientists

The Guardian

The year opened with a bang. The successful film Don't Look Up, in which a comet is found to be on a collision course with Earth, had been released just before Christmas 2021. In the bleak days of post-festive gloom, the news media were on an adrenaline high, chasing any and every story about potential asteroid collisions to cheer us all up. Five asteroids were to pass close to the Earth in January alone! Happily for the health and wellbeing of humanity, none was predicted to come within a whisker of hitting the planet.


AI helps assess pain levels in people with sickle cell disease

New Scientist

AI algorithms can assess the pain that someone with sickle cell disease is experiencing by using just their vital signs. Doing so could ensure people receive the most suitable pain management therapy for their condition. "There's always a trade-off between giving people sufficient medicine to reduce the pain and giving people too much medication so that they have bad side effects or a higher risk of addiction," says Daniel Abrams at Northwestern University in Illinois. But since pain is subjective, it is difficult to measure in a standardised way. Abrams and his colleagues set out to determine whether physiological data that is already routinely taken – including body temperature, heart rate and blood pressure – could be used to devise a system that assesses pain levels in a more objective manner.


Pain Intensity Assessment in Sickle Cell Disease patients using Vital Signs during Hospital Visits

Padhee, Swati, Alambo, Amanuel, Banerjee, Tanvi, Subramaniam, Arvind, Abrams, Daniel M., Nave, Gary K. Jr., Shah, Nirmish

arXiv.org Artificial Intelligence

Pain in sickle cell disease (SCD) is often associated with increased morbidity, mortality, and high healthcare costs. The standard method for predicting the absence, presence, and intensity of pain has long been self-report. However, medical providers struggle to correctly manage patients based on subjective pain reports, and pain medications often lead to further difficulties in patient communication as they may cause sedation and sleepiness. Recent studies have shown that objective physiological measures can predict subjective self-reported pain scores for inpatient visits using machine learning (ML) techniques. In this study, we evaluate the generalizability of ML techniques to data collected from 50 patients over an extended period across three types of hospital visits (i.e., inpatient, outpatient, and outpatient evaluation). We compare five classification algorithms for various pain intensity levels at both the intra-individual (within each patient) and inter-individual (between patients) levels. While all the tested classifiers perform much better than chance, a Decision Tree (DT) model performs best at predicting pain on an 11-point severity scale (0-10), with an accuracy of 0.728 at the inter-individual level and 0.653 at the intra-individual level. The accuracy of DT significantly improves to 0.941 on a 2-point rating scale (i.e., no/mild pain: 0-5, severe pain: 6-10) at the intra-individual level. Our experimental results demonstrate that ML techniques can provide an objective and quantitative evaluation of pain intensity levels for all three types of hospital visits.
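The 2-point rating scale the abstract reports (no/mild pain: 0-5, severe pain: 6-10) is a simple binarization of the 11-point self-report scale, sketched below; the cut-point is the one stated in the abstract, everything else is routine.

```python
# Sketch of the score binarization behind the 2-point rating scale:
# collapse the 11-point (0-10) self-report scale to no/mild vs severe.

def binarize_pain(score: int) -> str:
    """Map an 11-point pain score to the 2-point rating scale."""
    if not 0 <= score <= 10:
        raise ValueError("pain score must be in 0-10")
    return "no/mild" if score <= 5 else "severe"

print([binarize_pain(s) for s in [0, 5, 6, 10]])
# → ['no/mild', 'no/mild', 'severe', 'severe']
```

Coarsening the target this way is why the Decision Tree's accuracy jumps from 0.653 to 0.941 intra-individually: a 2-class problem tolerates small errors in the predicted score that the 11-class problem counts as misses.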